Size-based disciplines for job scheduling in data-intensive scalable computing systems. (Disciplines basées sur la taille pour la planification des jobs dans data-intensif scalable computing systems)
Abstract
The past decade has seen the rise of data-intensive scalable computing (DISC) systems, such as Hadoop, and the consequent demand for scheduling policies to manage their resources, so that they can provide quick response times as well as fairness. Schedulers for DISC systems usually focus on fairness, without optimizing response times. The best practices to overcome this problem include manual, ad-hoc control of the scheduling policy, which is error-prone and difficult to adapt to changes. In this thesis we focus on size-based scheduling for DISC systems. The main contribution of this work is the Hadoop Fair Sojourn Protocol (HFSP) scheduler, a size-based preemptive scheduler with aging; it provides fairness and achieves reduced response times thanks to its size-based nature. In DISC systems, job sizes are not known a priori: therefore, HFSP includes a job size estimation module, which computes approximate job sizes and refines these estimates as jobs progress. We show that the impact of estimation errors on size-based policies is not significant, under conditions which are verified in a system such as Hadoop. Because of this, and by virtue of being designed around the idea of working with estimated sizes, HFSP is largely tolerant to job size estimation errors. Our experimental results show that, in a real Hadoop deployment and with realistic workloads, HFSP performs better than the built-in scheduling policies, achieving both fairness and small mean response times. Moreover, HFSP maintains its good performance even when the cluster is heavily loaded, by focusing resources on a few selected jobs with the smallest size. HFSP is a preemptive policy: preemption in a DISC system can be implemented with different techniques. Approaches currently available in Hadoop have shortcomings that impact system performance. Therefore, we have implemented a new preemption technique, called suspension, that exploits operating system primitives to implement preemption in a way that guarantees low latency without penalizing low-priority jobs.
Similar resources
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Selecting models for capturing tree-size effects on growth–resource relationships
Subject trees included in growth analyses often vary in their initial size, possibly obscuring the effects of growth factors. We compare methods for incorporating size effects into growth models. For four different tree species, red maple (Acer rubrum L.), sugar maple (Acer saccharum Marsh.), American beech (Fagus grandifolia Ehrh.), and red oak (Quercus rubra L.), we compared models of radial ...
Formal Semantics of Array-OL, a Domain Specific Language for Intensive Multidimensional Signal Processing
In several application domains (detection systems, telecommunications, video processing, etc.) the applications deal with multidimensional data. These applications are usually embedded and subjected to real-time and resource constraints. The challenge is thus to provide efficient implementations on parallel and distributed architectures. Array-OL has been designed specifically to handle this ki...
Unités d'indexation et taille des requêtes pour la recherche d'information en français
ABSTRACT. In this article, we study information retrieval in French. We analyze different indexing techniques (based on lemmas, stems, or terms) and their fusion. We also analyze the influence of taking into account the different parts of a query. Our study covers 6 CLEF French evaluation campaigns. We show that the utili...
Equilibrium in Size-Based Scheduling Systems
Size-based scheduling is advocated to improve response times of small flows. While researchers continue to explore different ways of giving preferential treatment to small flows without causing starvation to other flows, little focus has been paid to the study of stability of systems that deploy size-based scheduling mechanisms. The question on stability arises from the fact that, users of such...
Publication date: 2014